Non-Compositional Language Model and Pattern Dictionary Development for Japanese Compound and Complex Sentences

Authors

  • Satoru Ikehara
  • Masato Tokuhisa
  • Jin'ichi Murakami
Abstract

To realize high-quality machine translation, we proposed a Non-Compositional Language Model and developed a sentence pattern dictionary of 226,800 pattern pairs for Japanese compound and complex sentences consisting of 2 or 3 clauses. In pattern generation from a parallel corpus, the Compositional Constituents that could be generalized amounted to 74% of independent words, 24% of phrases, and only 15% of clauses. This means that in Japanese-to-English MT, most of the translation results shown in the parallel corpus could not be obtained by methods based on Compositional Semantics. This dictionary achieved a syntactic coverage of 98% and a semantic coverage of 78%. It will substantially improve translation quality.

Similar resources

Pattern Dictionary Development Based on Non-compositional Language Model for Japanese Compound and Complex Sentences

A large-scale sentence pattern dictionary (SP-dictionary) for Japanese compound and complex sentences has been developed. The dictionary has been compiled based on the non-compositional language model. Sentences with 2 or 3 predicates are extracted from a Japanese-to-English parallel corpus of 1 million sentences, and the compositional constituents contained within them are generalized to produ...

Development of Semantic Pattern Dictionary for Non-linear Structures of Complex and Compound Sentences

A Semantically Classified Sentence Pattern Dictionary has been compiled on the basis of Semantic Typology in order to develop an Analogical Mapping Method for MT. This dictionary includes 221,563 Semantic Patterns, which have been generated from Japanese compound and complex sentences. The patterns have been made up in a semi-automatic manner using a set of variables (of full words) and function...

Analogical Mapping Method and Semantic Categorization of Japanese Compound and Complex Sentence Patterns

To overcome the limit of the conventional machine translation (MT) method based on compositional semantics, we proposed an Analogical Mapping (AM) method based on Semantic Typology and built a semantic category system for Japanese compound and complex sentences. The AM-method maps linguistic expressions into other expressions with the same meaning with semantic categorization (based on concepts...

Stress Pattern System in Central Sarawani Balochi

The present article investigates the stress pattern system of Central Sarawani Balochi (CSB), spoken in Sarawan located in Sistan and Baluchestan province of the Islamic Republic of Iran, based on metrical theory as developed in Hayes (1995). Correspondingly, the present research illustrates the position of primary and secondary stress in mono-morphemic words, verbal paradigms, compound words, ...

The Role of Non-Linguistic Variables in Production of Complex Linguistic Structures by Hearing-Impaired Children

Objectives: Language development is often much slower in hearing-impaired children than in their normal-hearing peers. Hearing impairment during childhood affects all aspects of speech production and language acquisition. It seems that hearing-impaired people suffer from language and speech impairments, such as in the production of complex linguistic structures. The purpose of this study is to determine ...

Publication date: 2008